Issues in Large Vocabulary, Multilingual Speech Recognition
Abstract
In this paper we report on our activities in multilingual, speaker-independent, large vocabulary continuous speech recognition. The multilingual aspect of this work is of particular importance in Europe, where each country has its own national language. Our existing recognizer for American English and French has been ported to British English and German. It has been assessed in the context of the LRE SQALE project, whose objective was to experiment with installing in Europe a multilingual evaluation paradigm for the assessment of large vocabulary, continuous speech recognition systems. The recognizer makes use of phone-based continuous density HMMs for acoustic modeling and n-gram statistics estimated on newspaper texts for language modeling. The system has been evaluated on a dictation task with read, newspaper-based corpora: the ARPA Wall Street Journal corpus of American English, the WSJCAM0 corpus of British English, the BREF-Le Monde corpus of French, and the PHONDAT-Frankfurter Rundschau corpus of German. Under closely matched conditions, the average word accuracy across all 4 languages is 85%, obtained with an open-vocabulary test and 20k trigram systems (a 64k system for German).
Similar resources
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...
Acoustic Phonetic Decoding Oriented to Multilingual Speech Recognition in the Basque Context
The development of Large Vocabulary Continuous Speech Recognition systems involves issues such as Acoustic Phonetic Decoding, Language Modelling, and the development of appropriate Language Resources. In the state of the art, new techniques for reusing the Language Resources of related, better-resourced languages are attracting great interest, and there is also a growing interest in Multilingual systems. T...
Multilingual Speech Recognition for Information Retrieval in Indian Context
This paper analyzes various issues in building an HMM-based multilingual speech recognizer for Indian languages. The system was originally designed for Hindi and Tamil and adapted to incorporate Indian-accented English. Language-specific characteristics in the speech recognition framework are highlighted. The recognizer is embedded in information retrieval applications and hence several iss...
The GlobalPhone Project: Multilingual LVCSR with JANUS-3
This paper describes our recent effort in developing the GlobalPhone database for multilingual large vocabulary continuous speech recognition. In particular we present the current status of the GlobalPhone corpus containing high quality speech data for the 9 languages Arabic, Chinese, Croatian, Japanese, Korean, Portuguese, Russian, Spanish, and Turkish. We also discuss the JANUS-3 toolkit and ho...
Development of Multilingual Acoustic Models in the GlobalPhone Project
This paper describes our recent effort in developing the GlobalPhone recognizer for multilingual large vocabulary continuous speech. [...] Turkish. Based on five languages we developed a global phoneme set and built a multilingual speech recognizer by varying the method of acoustic model combination. Context-dependent phoneme models are created using questions about languages and language groups. Results...